Abstract This is a study of the Avian coronavirus IBV and
chicken host relationship from the codon usage point of view,
based on fifty-nine non-redundant IBV S1 sequences (nt
1-507) from strains detected worldwide and chicken
tissue-specific protein gene sequences from IBV-replicating sites.
The effective number of codons (ENC) values ranged from
36 to 47.8, indicating a high-to-moderate codon usage bias.
The highest IBV codon adaptation index (CAI) value was
0.7, indicating a distant virus versus host synonymous
codon usage. The ENC x GC3 % curve indicates that both
mutational pressure and natural selection are the driving
forces on the codon usage pattern in S1. The low CAI values
agree with a low S protein expression and, considering that the S
protein is a determinant for attachment and neutralization,
this could be a further mechanism, besides mRNA transcription
attenuation, for a low expression of this protein,
leading to an immune camouflage.

Introduction


Infectious bronchitis virus (IBV) (Nidovirales: Coronaviridae:
Coronavirinae: Gammacoronavirus: Avian coronavirus)
is an enveloped single-stranded positive-sense
RNA virus with a genome of circa 27 kb and virions 120 nm in diameter, with
20 nm spikes formed by trimers of the spike glycoprotein S
(Cavanagh 2007; Thiel 2005).

The spike glycoprotein is a class I fusion protein with
two major domains, S1 and S2, which function in
receptor attachment and membrane fusion, respectively.
It is the target for neutralizing antibodies and its
evolution is so sensitively driven by the host humoral
immune response that polymorphisms in 10-15 amino
acids in S1 might give rise to different serotypes of the
virus (Cavanagh 2007; Thiel 2005).

Besides S, the IBV genome codes for 15 non-structural
proteins in ORF1, which occupies the 5’ two-thirds of the
genome; these proteins are involved in viral replication and
pathogenesis. The structural proteins E (envelope) and M
(membrane), involved in virion stability, and the nucleoprotein
(N), which associates with the genomic RNA to form the
helical nucleocapsid, are coded by the remaining 3’ 7 kb
(Thiel 2005).

Different pathotypes of IBV exist that cause disease in the
respiratory system, kidneys, and reproductive tracts of both males
and females, as well as enteritis. The virus occurs worldwide with
a massive diversity in terms of serotypes, genotypes, and
geographic-specific lineages (Cavanagh 2007; Jones 2010),
and thus studies on IBV molecular evolution must necessarily
include data sets that represent such diversity.

The host/virus relationship between IBV and chickens has been
comprehensively studied in terms of receptors, immune response,
molecular epidemiology, and pathogenesis (De Wit et al.
2011; Jones 2010; Winter et al. 2008; Yang et al. 2009),
pointing toward a positive outcome for the virus and a highly
negative one for the birds due to productive virus replication,
rapid disease spread, severe tissue damage, and immune
evasion that relies mainly on antigenic polymorphism.

Nonetheless, in-depth studies on the virus and host
relationship must also take into account that not all
synonymous codons for the same amino acid occur at the
same frequency in an mRNA, but are rather used at
varying frequencies, which is called codon usage bias.
Synonymous codon usage for the Nidovirales has been
shown to be virus-specific and conserved in a phylogenetic
fashion, though with no host-specificity (Gu et al. 2004).

IBV as well as chicken codon usage bias have been
studied already (Rao et al. 2011; Woo et al. 2007), but the
diversity of IBV types and the use of chicken genes coding
for tissue-specific proteins have not been considered
in an integrated way thus far.

In view of the lack of information on the relationship
between IBV and Gallus gallus, its natural host, from the
codon usage point of view, the aims of this study were
to understand the relationship of IBV and different chicken
tissues based on codon usage bias analyses and to assess
the forces that drive such a relationship.


Materials and Methods 
DNA Sequences 


A survey was carried out in Genbank for IBV S1 sequences
and the inclusion criteria were: geographic origin, pathotype,
recent detection for field strains and vaccine strains
related to archetypical IBVs; redundant sequences, i.e.,
those with nucleotide identities = 100 %, were excluded
from the analysis.

With these criteria, 59 sequences (Fig. 1) for nucleotides
1-507 (numbered relative to strain H120, GU393335) were used for
the analyses. Though the inclusion criteria were intended to
increase sequence diversity, the resulting sequences were
shorter than the whole S gene due to a low availability of
complete sequences for this gene, but sequences >150 nt
are considered as statistically reliable for codon usage bias
analysis (Gu et al. 2004).

Chicken tissue-specific non-redundant gene sequences
related to IBV replication sites were retrieved from Genbank for
duodenum (cholecystokinin NM001001741.1), lung (surfactant,
pulmonary-associated protein A1 NM204606.1), kidney
(vitamin D receptor NM205098.1), and oviduct (ovomucin
α-subunit AB046524.1) and these same genes were also retrieved
from the NCBI chicken genome resources (version
GCF_000002315.3). The G. gallus β-actin gene (L08165 and
GCF_000002315.3) was included in the analyses as a reference,
ubiquitously expressed gene.


Relative Synonymous Codon Usage (RSCU) 


RSCU is the ratio between the observed number for a codon
and its expected frequency under the random distribution of
all its corresponding isoacceptors and was calculated for 59
codons (64 codons minus 3 stop codons and the codons for
methionine and tryptophan with a single codon each) using
MEGA 5.0 (Tamura et al. 2011) according to the equation
$\mathrm{RSCU}_i = x_i / (\sum_j x_j / n)$, where $x_i$ is the total count for a given
codon, $\sum_j x_j$ is the sum of the counts for all isoacceptors
related to that amino acid, and $n$ is the number of possible
isoacceptors for that amino acid.

An RSCU > 1 means that a given codon is preferential,
an RSCU < 1 means that a given codon is not preferential,
and an RSCU = 1 means that the given codon is neutral.
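
The RSCU definition above can be illustrated with a minimal Python sketch. It is not the MEGA 5.0 implementation used in this study; the function name `rscu`, the use of the standard genetic code, and the choice to report 0.0 for codons of an amino acid that is absent from the sequence are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(seq):
    """Return {codon: RSCU} for the 59 degenerate sense codons of a coding sequence."""
    seq = seq.upper().replace("U", "T")
    # Split into complete codons only; a trailing partial codon is ignored
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group synonymous codons (isoacceptors), skipping stops, Met and Trp
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families.setdefault(aa, []).append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)  # sum of x_j over the isoacceptors
        n = len(family)                         # number of possible isoacceptors
        for c in family:
            # Assumption: an absent amino acid yields 0.0 rather than an undefined value
            values[c] = counts[c] / (total / n) if total else 0.0
    return values
```

For the toy sequence `GGTGGTGGGGGA` (four glycine codons), `rscu` gives 2.0 for GGT, 1.0 for GGG and GGA, and 0.0 for GGC, i.e., one preferred, two neutral, and one non-preferred codon.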

In order to represent both host and virus codon usage preferences
in a single tree, the following algorithm was developed:
first, RSCU continuous variables for both G. gallus and
IBV were converted to discrete binary data using 1 for
RSCU > 1 (i.e., a given codon is preferred for a specific amino
acid) and 0 for RSCU ≤ 1 (i.e., the codon is not preferred or is
neutral). Next, a matrix was built using the binary data for the
presence or absence of that isoacceptor/allele as a preferred
codon and used to build a Neighbor-joining tree (1,000 bootstrap
replicates) using PAUP* 4.1b (Swofford 2000).
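
The binary recoding step can be sketched as follows. The distance actually used by PAUP* to build the tree is not stated in the text, so the simple mismatch fraction below is an assumption chosen only to show how the 0/1 matrix could feed a distance-based method; the names `binarize`, `mismatch_distance`, and `distance_matrix` are hypothetical.

```python
def binarize(rscu_values):
    """Map codon -> 1 if RSCU > 1 (preferred), else 0 (non-preferred or neutral)."""
    return {codon: int(value > 1.0) for codon, value in rscu_values.items()}


def mismatch_distance(a, b):
    """Fraction of shared codons whose preferred/non-preferred state differs.

    Assumption: a simple mismatch distance; the source does not name the measure.
    """
    shared = sorted(set(a) & set(b))
    if not shared:
        raise ValueError("profiles share no codons")
    return sum(a[c] != b[c] for c in shared) / len(shared)


def distance_matrix(profiles):
    """Build a pairwise distance matrix from {name: {codon: RSCU}} profiles."""
    names = sorted(profiles)
    binary = {name: binarize(profiles[name]) for name in names}
    matrix = [[mismatch_distance(binary[p], binary[q]) for q in names] for p in names]
    return names, matrix
```

The resulting matrix could then be passed to any Neighbor-joining implementation; bootstrap resampling of the codon columns is not shown here.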


Effective Number of Codons (ENC) 


ENC is similar to the effective number of alleles and
measures the departure from the equal use of synonymous
codons, taking each isoacceptor as an allele, and was calculated
with ACUA (Vetrivel et al. 2007) with the equation
$\mathrm{ENC}_{actual} = 2 + (9/F_2) + (1/F_3) + (5/F_4) + (3/F_6)$,
where $F_i$ is the average homozygosity (assuming equal use
of each synonymous codon or allele) estimate for each
class of degeneracy ranging from 2 to 6. ENC ranges from
20 to 61, values closer to 61 meaning low bias (Wright
1990).
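
A minimal sketch of the ENC calculation is given below. It is not the ACUA implementation; it assumes the standard genetic code, the per-amino-acid homozygosity estimator F = (n Σp² - 1)/(n - 1) with n the number of codons observed for that amino acid and p the codon frequencies, and the fallback of averaging F2 and F4 when isoleucine (the only threefold class) is absent. Function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def homozygosity(counts):
    """Homozygosity F for one amino acid from its synonymous codon counts."""
    n = sum(counts)
    if n < 2:
        return None  # F cannot be estimated from fewer than two codons
    return (n * sum((c / n) ** 2 for c in counts) - 1) / (n - 1)


def effective_number_of_codons(seq):
    """ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    # Collect F values per degeneracy class (2, 3, 4 or 6 synonymous codons)
    by_class = defaultdict(list)
    for family in families.values():
        if len(family) == 1:
            continue  # Met and Trp carry no synonymous choice
        f = homozygosity([counts[c] for c in family])
        if f is not None and f > 0:
            by_class[len(family)].append(f)

    mean_f = {k: sum(v) / len(v) for k, v in by_class.items()}
    # Assumption: when isoleucine is missing, F3 is taken as the mean of F2 and F4
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    missing = [k for k in (2, 3, 4, 6) if k not in mean_f]
    if missing:
        raise ValueError(f"no usable amino acids in degeneracy classes {missing}")

    enc = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(enc, 61.0)
```

A sequence that always uses a single codon per amino acid has F = 1 in every class and therefore reaches the lower bound of 20, while equal use of all synonymous codons (F2 = 1/2, F3 = 1/3, F4 = 1/4, F6 = 1/6) gives the upper bound of 61.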


Natural Selection x Drift Test 


To test if the codon usage of IBV S1 is under natural
selection or, conversely, mutation pressure is driving codon
usage bias, expected and observed ENC and the corresponding
GC3 % values were plotted in the same graph.
Expected ENC, meaning the expected codon usage if
it is influenced only by GC3 %, i.e., the percent of G and C
at the third position of all codons, was calculated as
$\mathrm{ENC}_{expec} = 2 + s + 29[s^2 + (1 - s)^2]^{-1}$ (Wright 1990),
where $s$ is the GC3 % (as a proportion). Then, each $\mathrm{ENC}_{expec}$ was plotted
against each respective GC3 % and the actual ENCs were
added to the graph to measure their deviation from the
expected values. If an $\mathrm{ENC}_{actual}$ point lies on or just below
the $\mathrm{ENC}_{expec}$ curve, this might be interpreted as drift/mutational
bias and if a point is distant from the curve, this
means that natural selection is in action and GC3 % does
not follow genomic GC3 % (Wright 1990).

Fig. 1 Neighbor-joining tree for binary data for preferred (1) or
non-preferred/neutral (0) codons for 59 codons for 59 IBV S1 sequences
and 5 G. gallus genes (B-actin β-actin, CCK cholecystokinin, vit D
receptor vitamin D receptor, ovomucin alpha ovomucin α-subunit,
SPFA surfactant, pulmonary-associated protein A1). Sequences with
ENC (effective number of codons) values <40 and >45 are marked
with asterisk and hash, respectively; sequences with ENC between 40
and 45 have no marks. Numbers at each node are bootstrap values
(only >50 are shown)


Codon Adaptation Index (CAI)


CAI is an estimate of the adaptation of synonymous
codons to a given expression system with a set of highly
expressed genes (Comeron and Aguadé 1998) and might be
used to estimate the adaptation of virus to host codons.
A CAI < 1 means low fit and use of codons which are
non-preferred by the host while CAI = 1 means high
virus x host codon fit. CAI was calculated with ACUA
(Vetrivel et al. 2007) with a default set of highly expressed
G. gallus genes.
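
As a rough illustration of how a CAI of this kind can be obtained, the sketch below follows the usual formulation in which each codon receives a relative adaptiveness weight (its count in a reference set divided by the count of the most used synonymous codon) and the CAI of a gene is the geometric mean of the weights of its codons. The ACUA default reference set of highly expressed G. gallus genes is not reproduced here; the reference sequences, the 0.5 pseudocount for unused codons, and all function names are assumptions.

```python
import math
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(seq):
    """Split a coding sequence into complete codons (DNA alphabet)."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs):
    """Weights w = count / max synonymous count, from a reference gene set."""
    counts = Counter()
    for seq in reference_seqs:
        counts.update(split_codons(seq))

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families[aa].append(codon)

    weights = {}
    for family in families.values():
        top = max(counts[c] for c in family)
        if top == 0:
            continue  # amino acid absent from the reference set
        for c in family:
            # Assumption: unused codons get a 0.5 pseudocount to avoid zero weights
            weights[c] = max(counts[c], 0.5) / top
    return weights


def cai(seq, weights):
    """Geometric mean of the weights of all scorable codons in seq."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("sequence has no codons covered by the reference weights")
    return math.exp(sum(logs) / len(logs))
```

With a hypothetical reference `["GGTGGTGGTGGC"]`, the sequence `GGTGGT` scores 1.0 (only the most used host codon), whereas `GGTGGC` scores the square root of 1/3, about 0.58.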


Results
RSCU 


IBV S1 codons that differed from all host genes studied in
non-variable amino acid positions were UUA (L169), GUA
(V49), UCA (S93), and GGU/GGG (G39, 44, 45, and 89)
for the strains analyzed herein. AUA (I) was also exclusive to
the IBV strains and not used by the host, but no 100 %
conserved site for this amino acid was found amongst the
sequences studied. The codon usage tree (Fig. 1) showed
that all IBV strains segregated in the same cluster and close
to lung codon usage, at an increasing distance from oviduct
to kidney, duodenum, and the β-actin gene. Regarding
specifically the IBV strains, the tree topology was similar
to that expected for an S1 nucleotide tree, with strains
segregating according to established genotypes.


Effective Number of Codons 


Observed ENC ranged from 36.04 (strain GQ169238.1 IBV/Brazil/PRBOIM)
to 47.83 (strain EU914938.1 MoroccanG/83)
for IBVs (mean 42.79, sd 2.25). For G. gallus, mean
observed ENCs were 33.59 (vitamin D receptor), 40.03
(β-actin), 46.48 (cholecystokinin), 50.21 (surfactant,
pulmonary-associated protein A1), and 53.03 (ovomucin α-subunit).
Considering an ENC < 40 as indicative of bias, biased
codon usage was found for IBV strains GU383110.1 USP73C
(39.94), GU383107.1 USP70C (38.82), GU383084.1 USP47C
(38.21), GU383071.1 USP34C (38.88), GQ169242.1 IBV/Brazil/PROSM
(39.14), GQ169239.1 IBV/Brazil/PRO1M
(38.07), GQ169240.1 IBV/Brazil/PRO2M (37.8), and
GQ169238.1 IBV/Brazil/PRBOIM (36.04), all Brazilian strains.

On the other hand, strains X15832.1 D274, EU914938.1
MoroccanG/83, DQ901377.1 It/497/02, and DQ901376.1
UK/L633/04 from the Netherlands, Morocco, Italy, and the
UK, respectively, showed ENCs > 45 and grouped in the same cluster.

No temporal pattern was found for ENCs, as IBV strains 
detected decades apart showed similar ENC values. 


Natural Selection x Drift Test 


The graph with the expected ENC x expected GC3 %
and the observed ENC x observed GC3 % (Fig. 2) showed
that some IBV S1 and G. gallus points are just below the
expected curve, while others are more distantly below the
curve, indicating that both mutation pressure and natural
selection are driving forces for the observed bias.
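
The comparison between observed and expected ENC can be sketched in a few lines of Python. The expected curve is taken as ENC = 2 + s + 29/[s² + (1 - s)²] (Wright 1990), with s the GC3 content expressed as a fraction between 0 and 1; the function names and the use of all third codon positions for GC3 are assumptions, and no cutoff for "just below" versus "distantly below" the curve is implied.

```python
def expected_enc(gc3):
    """Expected ENC when codon usage is shaped only by GC3 (fraction, 0 to 1)."""
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    return 2 + gc3 + 29 / (gc3 ** 2 + (1 - gc3) ** 2)


def gc3_fraction(seq):
    """Fraction of G or C at the third position of all complete codons."""
    seq = seq.upper().replace("U", "T")
    thirds = seq[2::3]  # indices 2, 5, 8, ... exist only for complete codons
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def enc_deviation(observed_enc, gc3):
    """Distance of an observed ENC below (positive) or above (negative) the curve."""
    return expected_enc(gc3) - observed_enc
```

For instance, at a GC3 of 0.5 the expected value is 60.5, so an observed ENC of 42.79 (the IBV mean reported above, used here with a hypothetical GC3) would lie 17.71 units below the curve.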


Codon Adaptation Index 


IBV CAI values ranged from 0.64 (strain GQ169240.1 IBV/Brazil/PRO2M)
to 0.7 (strain AF274435.1 DE072) with a mean
value of 0.66 (sd 0.01), while G. gallus genes CAIs ranged from
0.71 (surfactant protein A gene NM204606.1) to 0.88 (vitamin
D receptor NM205098.1 and GCF_000002315.3), indicating a
moderately low adaptation of IBV S1 codons to G. gallus
codons.

Fig. 2 Expected (curve) and observed (points) effective number of
codons (Y axis) and GC3 % (X axis) for 59 IBV S1 sequences (dots)
and G. gallus β-actin, cholecystokinin, surfactant, pulmonary-associated
protein A1, vitamin D receptor and ovomucin genes (asterisks)


Discussion 


It is well known that IBV strains, regardless of their specific
pathotypes, use the respiratory tract as a first replication
site, from where they might spread to kidneys, reproductive
system, and the gastroenteric tissues (Cavanagh 2007).

α2,3-Linked sialic acid, a membrane receptor for the IBV
spike protein (Winter et al. 2008), is widespread in chicken
epithelial cells, rendering a variety of cell types susceptible
to IBV infection; it is noteworthy that IBV of different
pathotypes might show no differences in receptor preferences
(Abd El Rahman et al. 2009).

Though this fact accounts for a successful replication in 
the respiratory epithelium and the virus spread to other 
organs, the events that come after virus attachment have 
been widely overlooked. 

IBV S1 codon usage based on RSCU values has been
shown herein to be more closely related to respiratory tract codon
usage bias than to the values found in the oviduct, duodenum,
and kidney. This is evident in Fig. 1, as all IBV
strains segregated together with the pulmonary chicken
surfactant protein gene with a bootstrap of 90. From the
molecular point of view, this means that IBV shares a close
relationship with the respiratory tract when it comes to
codons and consequently tRNA usage.

The distant relationship between IBV and non-respiratory
host tissues in terms of codon usage seen in Fig. 1 does not
mean that they are completely opposite, but that there is an
ordered, increasing dissimilarity between the virus and the
oviduct, kidneys, and duodenum, respectively. It is interesting
to speculate that a successful replication of IBV in the respiratory
tract would allow for the emergence of a higher virus
diversity and titre with an improved fitness to the other target
tissues.

It is noteworthy that the most distant host genes in the
tree are those from β-actin, included as a ubiquitously
expressed gene, as the closer proximity of IBV to other
host genes is further evidence of a fine-tuned adaptation
of the virus to some specific tissues.

But essential disagreements between virus and host cells
emerge from this analysis. RSCU values show that codon
usage bias exists amongst the IBV strains studied to a
low degree but in a conserved pattern for codons in seven
positions. It is noteworthy that the most frequent of these
amino acids was glycine (positions 39, 44, 45, and 89), an
amino acid with a short side chain that allows for a high
steric plasticity (Berg et al. 2002).

Considering that residues 39, 44, and 45 are within
antigenic domain I of S1 (Moore et al. 1997), the use of
exclusive codons with no competition with the host would
allow for the maintenance of an amino acid at certain
strategic positions whose steric plasticity would be
translated to a plastic protein structure for S1, increasing
the number of possible protein structures and contributing
to the huge set of putative epitopes for S1 and the continuous
emergence of escape mutants.

Serine, found as a conserved amino acid with an
IBV-exclusively preferred codon (UCA) at position 93, is a
hydrophilic amino acid in antigenic domain II of S1 (Moore
et al. 1997) with a high propensity to turns in protein secondary
structure (Berg et al. 2002), which could also be
important to keep structural stability in the protein
neighborhood for virus-cell attachment.

Though the three other 100 % conserved amino acids with
IBV-exclusively preferred codons (L169, V49, and G89)
are not located exactly inside antigenic domains, the high
bias found for these must have some importance for the
neighboring structures for aspects of the spike protein not
related simply to antigenicity, but to the ignored face of the
virus itself, such as protein stability.

Valine and leucine, both non-polar, hydrophobic amino
acids, take part more often in α-helices and β-sheets,
respectively (Berg et al. 2002), and their maintenance at
those respective positions might also have to do with S1
globular structure stability.

In this study, the highest IBV CAI was 0.7, which is
below the lowest G. gallus CAI (0.71 for surfactant protein A
gene NM204606.1), evidencing that for all IBVs and for
some G. gallus genes there is a trend for a low CAI, with
the consequent lower efficiency of protein synthesis (Sharp
and Li 1987), meaning that the IBV spike gene follows the
trend shown by low-expressed genes of its host for a codon
deoptimization-based regulation of translation.

Codon bias is stronger in high than in low expression
genes in terms of protein synthesis efficiency at the initiation
step, meaning that the most 5’ nucleotides of any gene,
such as the S region focused on in this study, are more critical
for protein synthesis efficiency. Thus, in genes with high
expression, natural selection acts against codon changes,
keeping the correspondence between codons and the tRNAs
of higher availability (Bulmer 1991; Ridley 2004).

Coronavirus mRNA transcription happens in an attenuated
form from the smallest to the largest mRNAs from the
3’ to the 5’ end of the genome, with smaller sub-genomic
mRNAs being transcribed in higher amounts (Van Marle
et al. 1995), and S is the second gene after ORF1, meaning
that S is synthesized in a lower amount when compared to the
other 3’ coronavirus proteins.

The spike protein is a major target for neutralizing antibodies
and the presentation of this protein to the chicken immune
system allows for the production of such antibodies. Thus, a
lower amount of S favored by transcription attenuation
would allow for a lower exposure to the immune system
and a low CAI could constitute a still unknown but herein
mathematically demonstrated mechanism that, associated with
mRNA transcription attenuation, allows for a parsimonious
spike protein synthesis and immune camouflage for IBV. A
similar mechanism has been suggested for Pestiviruses as a
consequence of a high number of underrepresented codons
leading to decreased protein expression and a less intense
host immune response (Zhou et al. 2012).

Furthermore, if a higher similarity between virus and host 
codon usage would allow for a higher viral protein expres- 
sion, it could be that a G. gallus codon-optimized attenuated 
IBV vaccine would result in an increased immune response 
due to a higher spike protein expression. 

An indication of a geographic pattern for codon usage can
be noticed in Fig. 1, as only Brazilian IBV strains showed the
lowest ENC values. The significance of this finding in terms
of virulence and immunity cannot yet be understood, as
no data on these parameters are available for these strains, but
considering the high diversity of IBV in this country (Chacon
et al. 2011; Villarreal et al. 2010), low ENCs could be a
further mechanism for the emergence of escape mutants.

The highest (>45) ENC values were found for strains from
countries as distant as Morocco and The Netherlands,
including the UK and Italy, but a very low number of
sequences from these areas is available in Genbank when
compared to, e.g., Brazil; thus, instead of a geographic pattern
for low codon usage bias in this case, i.e., high ENCs, this
could be primarily attributed to a lack of sequence diversity.

The expected versus observed ENC x GC3 % graph
showed that natural selection is not acting alone on the
codon usage patterns of G. gallus (as already shown by Rao
et al. 2011) and of the IBV strains under analysis, but in
association with mutation pressure. As S is expressed in
lower amounts when compared to other IBV proteins (as
discussed above) and thus its synthesis relies on those
tRNAs of lower availability in G. gallus cells, this could be
the reason for the presence of drift in a nearly neutral
evolution mode, i.e., for some S1 sequence codons there is
no competition with host tRNAs and thus third position
nucleotides are not subjected to selection but might follow
the whole genome GC % trend instead.

In conclusion, IBV types show a concerted codon bias
for epitope-important amino acids on the spike protein, with
a general codon usage pattern of the virus closer to that of the
respiratory tract than to those of other replication sites, driven by
genetic drift and natural selection.